core.experimental.memory module with D replacement for memcpy added#2687
core.experimental.memory module with D replacement for memcpy added#2687baziotis wants to merge 13 commits intodlang:masterfrom baziotis:Dmemcpy
Conversation
| @@ -0,0 +1,170 @@ | |||
| /** | |||
There was a problem hiding this comment.
Should this be in core.experimental.memory? Wouldn't core.experimental make more sense?
There was a problem hiding this comment.
It depends on how you look at it. It was proposed here: #2671 (comment)
Which makes sense if the SIMD operations are only memory operations (which they are currently). Eventually, I hope that this will become a bigger SIMD library and be moved out to experimental (or wherever) because D needs one that is able to cross-compile. At least I don't know of one and so I made this.
Perhaps make two PRs but include the SIMD commits in the |
Yes, that's an alternative. But then if changes happen in this or the other PR, then the changes have to be made on both PRs. Ideally, first the PR on SIMD would be made, then this, but it was done to save time. It makes some sense to be here, as the |
thewilsonator
left a comment
There was a problem hiding this comment.
Please clean up your notes and TODOs
|
|
||
| /* Handle dynamic sizes. `d` and `s` must not overlap. | ||
| */ | ||
| void Dmemcpy(void* d, const(void)* s, size_t n) |
There was a problem hiding this comment.
That is public because of the description of this PR:
This is a D replacement for the C Standard Library memcpy. It provides 2 interfaces:
The "D" one, which takes, one of ref, dynamic arrays or static arrays.
The "C" one, via the name Dmemcpy (for disambiguation with the libc), which takes the classic:(void*, void*, size_t).
So, because the idea was to provide the C interface as well. Is this desired?
Otherwise, yes it should be private.
There was a problem hiding this comment.
Isn't that the C version then? I think all bases should be covered with static arrays and dynamic arrays, you can always slice a pointer to create a dynamic array from it and a length..
There was a problem hiding this comment.
Yes, it is the C version (meaning, with the C interface). It is named Dmemcpy for disambiguation with the libc memcpy.
We changed the Dmemcpy name to memcpy for the D interface, which provides 3 overloads, the ones you can see on top of the file here, named memcpy. I guess this was done with the purpose that they are overloads for the libc memcpy. They provide handling for refs, static arrays and dynamic arrays.
But I thought it would be good to also provide a C interface but for the D version (aka the druntime version) named Dmemcpy, that is different from the libc.
There was a problem hiding this comment.
it's not supported on all platforms.
Where is it not supported?
D just doesn't have a standard for it... but that's very possible to fix.
There was a problem hiding this comment.
Where is it not supported?
AFAIK MSVC (or whatever object format that Windows uses, COFF/PE?) doesn't support weak linkage, but I might be wrong.
D just doesn't have a standard for it... but that's very possible to fix.
I agree, would be great to have this.
There was a problem hiding this comment.
but I might be wrong.
Yeah. I'm pretty sure weak linkage is available everywhere, and it might be particularly useful in this project.
I wonder if there's any reason that we don't have a weak attribute already? I think I've asked for it in the past and it was denied because reasons... I wonder what they were :/
There was a problem hiding this comment.
Function should still be private or have a clear confusion free name.
There was a problem hiding this comment.
Yeah. I'm pretty sure weak linkage is available everywhere
The accepted answer and the most up-voted answer on StackOverflow[1] suggest that at least the semantics are different with MSVC. Other compilers/linkers might implement workarounds though, I don't have access to a Windows machine to test on.
|
I think you need to add this to the doc makefile. |
|
Isn't the doc makefile the |
Yes, but new (sub)directories have to be added in |
|
Thanks Les! |
| const(void)* s = cast(const(void)*) src.ptr; | ||
| size_t n = dst.length * typeof(dst[0]).sizeof; | ||
| // Assume that there is no overlap. | ||
| pragma(inline, true); |
There was a problem hiding this comment.
What? It shows me 4 lines. Please point me to the line.
There was a problem hiding this comment.
The one I tagged... pragma(inline, true);
That's a weird statement, what does it mean? What does it apply to?
There was a problem hiding this comment.
As far as I know, pragma(inline, true) instructs the compiler to inline the function call that follows.
There was a problem hiding this comment.
as a stand-alone statement on its own hanging there? What does it apply to?
Can you show the doco where you read that? If that exists I'll be very surprised (I've been asking for it for many years).
There was a problem hiding this comment.
It has effect, but not the effect I said. My mistake: https://dlang.org/spec/pragma.html#inline
It inlines the function it is enclosed by.
There was a problem hiding this comment.
Oh wow. I had no idea you could do that. That's very weird.
There was a problem hiding this comment.
It seems weird to me as well..
| void memcpy(T, size_t len)(ref T[len] dst, ref const T[len] src) | ||
| { | ||
| T[] d = dst[0 .. $]; | ||
| const T[] s = src[0 .. $]; |
There was a problem hiding this comment.
Why write these on separate lines?
There was a problem hiding this comment.
d and s? How else could they be?
There was a problem hiding this comment.
memcpy(dst[0 .. $], src[0 .. $]);
I only say because I stopped and re-read these lines carefully looking for the gotcha I assumed must exist to justify this long-written version. To me, writing it out this way makes it look like there's unexpected detail that should be distinctly stated... maybe that's just me though. I was looking for the substance in the noise, and there wasn't any.
There was a problem hiding this comment.
I agree. And yes, there's no gotcha.
There was a problem hiding this comment.
I agree that writing it in one line expresses the intent clearer.
| { | ||
| ubyte* dst = cast(ubyte*) d; | ||
| const(ubyte)* src = cast(const(ubyte)*) s; | ||
| for (size_t i = 0; i != n; ++i) |
There was a problem hiding this comment.
Why isn't this at least moving ulong's?
There was a problem hiding this comment.
Your comment made me think, thanks.
I could use the modified GNU algorithm: https://www.embedded.com/print/4024961
but to me this only complicates the code without any real gain. The reason is that it only tests if they're aligned and doesn't reach alignment. To do a more serious 4-byte or 8-byte move, one has reach alignment etc. like this: https://github.com/dlang/druntime/pull/2662/files#diff-b2f13b9afdcf6d39739b2bf55035f8e2R166
The reason I did not do either of those is that the compiler can do this and other things itself. It should be able to understand that moving either 4 or 8 bytes at a time is faster, reach alignment, even add SIMD. For example
LDC: https://godbolt.org/z/2V1Wl8
GDC: https://godbolt.org/z/XXzLUQ (GDC needs -O3 to do the crazy stuff).
Here, the compiler does all of the above. It even added something like a switch.
But, here's the part I hadn't tested. If on LDC we pass the -disable-loop-vectorization
LDC: https://godbolt.org/z/WVVJlz
or we specify an architecture that does not support SIMD, like let's say pentium2.
LDC: https://godbolt.org/z/Q1vqOS
then it doesn't do 4-byte moves (It should be 4 bytes there since it is 32-bit compilation).
The same is true for GDC: https://godbolt.org/z/PK8YDQ
So, I don't know if it pays to reach alignment but definetely I will use the GNU algorithm. It adds value without complicating the code much.
There was a problem hiding this comment.
You shouldn't be depending on optimisations that DMD can't do.
And LDC/GDC should be using intrinsics exclusively, definitely not calling any of these functions. Taking that knowledge away from the optimiser may create some very lame copies in optimised code.
There was a problem hiding this comment.
DMD was the initial focus. Then, it changed LDC and GDC so pretty much, what DMD does became irrelevant. Otherwise, the code would be way different. This is not a hypothesis. I spent one month writing DMD versions of these functions that had to reach the libc ones. They were completely different from the current ones due to the optimizations that DMD could not do.
Meaning, if LDC and GDC use their intrinsics, this project becomes irrelevant. But to use their intrinsics, the above question should be answered, which is what LDC and GDC do when libc is not available.
For me, the logical move is to inform the optimizer that e.g. our memset is supposed to do what memset does. Which enables us to write our own versions of these functions (aka libc independency) and not lose the optimization info.
There was a problem hiding this comment.
which is what LDC and GDC do when libc is not available.
There's a difference between a call to a function called mempy (which is hooked by an intrinsic), and a symbol called memcpy in a lib somewhere. You're writing the latter, but the former is what appears in the code. LLVM/GCC intrinsics don't always emit a call to a memcpy function, they can do anything they like. They might emit a call to a memcpy function, and if they do, it would be an advantage to understand in what context that can occur.
I don't know the answer here; I haven't looked into this, it might be that they just call the libc memcpy when they decide not to do anything special; or they might call into their own small runtime libraries? Perhaps their runtime libraries have a small suite of such functions optimised for different cases? I don't know, but anything's possible here.
All the compilers have a runtime library which is a lower-level library than libc. I'm not sure if those libraries contain memcpy/memset's for when the intrinsics bail out... or if they all just leave it to the libc implementation? If the intrinsic's fall-back functions are in the runtime library and not taken from libc, then we already have libc independence in those cases.
Anyway, I can see a world where it may be useful to have these functions for when the intrinsic does bail-out, but I think it's important to understand the implementation details of each compilers intrinsics to understand what they expect from these functions and the conditions when they're called.
You'll never be able to write optimised implementations for all platforms (or even x86, because it's a moving target); so I feel like the most interesting victory here is for DMD.
I'm sorry, I don't really know anything about this project you're working on, or the parameters for completion... but I think tackling memcpy/memset are probably lower priority when considering libc independence, and possibly the 2 most highly contentious items on the list.
There was a problem hiding this comment.
Meaning, if LDC and GDC use their intrinsics, this project becomes irrelevant.
Just to be clear; we should assume they are, and do absolutely everything we must to make sure we don't interfere with that. The case for us to consider here is when the intrinsics decide not to do anything special and fall-back to a lib-call... what functions do they call when they bail out? Is it libc memcpy/set, or something else customised for their runtimes?
There was a problem hiding this comment.
Musl may be a good reference too, they have optimized C versions, that simply translate in D (and gdc/ldc would generate equivalent code to C).
https://github.com/bminor/musl/blob/master/src/string/memcpy.c
https://github.com/bminor/musl/blob/master/src/string/memset.c
There was a problem hiding this comment.
@TurkeyMan I agree with everything you wrote. You see, when this project started (or when it moved to LDC / GDC), I didn't know any of these proeblems. It was Johan's comment (about a week ago) that made me realize "Oh, not using the intrinsics is like a really bad thing". I'm not sure how to move forward..
There was a problem hiding this comment.
@ibuclaw Yes, they're good reference, thanks. It was one of the early codes I was referred to. Generally though, Agner's implementations are supposed to be the fastest around (although that's not exactly true compared to the libc. This could be a very long discussion). He's considered a master in this field.
Other than that, unfortunately, I don't think that the main problems are with the code but with the goals.
wilzbach
left a comment
There was a problem hiding this comment.
- move traits in a separate PR into druntime, but with all tests. Then you can replace their Phobos implementation with a reference (we have switched to this paradigm a while ago. Applies to the old traits as well, but no one got around doing that, but at least no stuff shouldn't be duplicated)
- lots of easy style improvements (foreach, one empty line between function)
- attributes for unittests (especially the new
@betterC) - docs seem to be a bit outdated and comments in general need polishing. All Stefanos notes should ideally be resolved or converted in a neutral comment
| testStaticType!(S!32768); | ||
| testStaticType!(S!65536); | ||
| } | ||
| pragma(inline, false) |
There was a problem hiding this comment.
Remove. It doesn't do anything.
| testStaticType!(ulong); | ||
| testStaticType!(float); | ||
| testStaticType!(double); | ||
| testStaticType!(real); |
There was a problem hiding this comment.
You can use static foreach for types too.
There was a problem hiding this comment.
You mean with AliasSeq probably. The static foreach is a problem for GDC though. Do you want 2 versions?
There was a problem hiding this comment.
For testing, that makes sense. However, this module won't be backported to dmd-cxx, so you don't need to worry about backwards compatibility for older versions of D.
| testStaticType!(S!8192); | ||
| testStaticType!(S!16384); | ||
| testStaticType!(S!32768); | ||
| testStaticType!(S!65536); |
There was a problem hiding this comment.
Again, the GDC thing.
|
|
||
| /* Handle dynamic sizes. `d` and `s` must not overlap. | ||
| */ | ||
| void Dmemcpy(void* d, const(void)* s, size_t n) |
There was a problem hiding this comment.
Function should still be private or have a clear confusion free name.
| static string loop(string prefetchChoice)() | ||
| { | ||
| return | ||
| " |
There was a problem hiding this comment.
I can't find an easy way to interpolate the prefetchChoice with token strings.
There was a problem hiding this comment.
I can't somehow answer on the comment above, so:
Function should still be private or have a clear confusion free name
If it is private, then the whole discussion about providing a C-like has no meaning.
So, if you don't want a C-like interface then I make it private otherwise, what would be a confusion-free name?
There was a problem hiding this comment.
can't find an easy way to interpolate the prefetchChoice with token strings.
q{ ...} ~ ...
| * N.B.: Both the memcpy() here and Dmemcpy() return nothing, contrary to the C Standard | ||
| * Library version. | ||
| * Source: $(DRUNTIMESRC core/experimental/memory/memcpy.d) | ||
| */ |
There was a problem hiding this comment.
DStyle is without leading stars.
There was a problem hiding this comment.
I don't really get where to use the one and where the other.
For example, with leading stars: https://github.com/dlang/phobos/blob/master/std/algorithm/searching.d#L277
Without leading stars:
https://github.com/dlang/phobos/blob/master/std/algorithm/searching.d#L213
There was a problem hiding this comment.
If you're not sure, you can check the D Style document[1], which should always be followed for new code: "Documentation comments should not have leading stars on each line."
A lot of the code in druntime (and sadly also some in Phobos) is old and doesn't follow the style very closely or at all.
There was a problem hiding this comment.
Thanks, I'm trying to follow but things escape here and there.
There was a problem hiding this comment.
Yes there's lots of old code here and in Phobos that was never changed, but at least new code should follow the guidelines.
|
|
||
| /** | ||
| * Handle Static Types | ||
| * N.B.: No need for more sophisticated code for static types. The compiler |
There was a problem hiding this comment.
Use full sentences and no abbreviation. Have a look at std.algorithm for good examples on how to write good doc headers.
There was a problem hiding this comment.
I think it was a note and I was referred to a N.B. So, Note?
| import core.internal.traits : isArray; | ||
|
|
||
| /** | ||
| * Handle Static Types |
There was a problem hiding this comment.
This distinction isn't important to the user.
There was a problem hiding this comment.
You mean it should be just a // that is not part of the doc?
There was a problem hiding this comment.
No I meant that as you have both overloads defined, this is an implementation detail. Though as said on another point there's a case to simplify the interface even further, but it definitely shouldn't expose such details to the user.
There was a problem hiding this comment.
this is an implementation detail
Yes, so the//wouldn't be the right choice i.e. not be part of the doc?
On the merging of the interfaces there were problems that I could not solve: #2687 (comment)
| */ | ||
| pragma(inline, true) | ||
| void memcpy(T)(ref T dst, ref const T src) | ||
| if (!isArray!T) |
There was a problem hiding this comment.
Why not use this as the user facing API and delegate with static if? Reduces the exposed interfaces -> less confusion.
There was a problem hiding this comment.
Because for me it's more confusing with one interface. I'll change it.
There was a problem hiding this comment.
I have a problem that I think I had when I had tried to join in one interface all of them. Basically, somehow the arrays get confused. e.g. if I have this code:
void memcpy(T)(ref T dst, ref const T src) nothrow @nogc
{
static if (!isArray!T)
{
dst = src;
}
else
{
void memcpyArray(T)(ref T[] dst, ref const T[] src) nothrow @nogc
{
assert(dst.length == src.length);
void* d = cast(void*) dst.ptr;
const(void)* s = cast(const(void)*) src.ptr;
size_t n = dst.length * typeof(dst[0]).sizeof;
// Assume that there is no overlap.
Dmemcpy(d, s, n);
}
static if (isDynamicArray!T)
{
memcpyArray(dst, src);
}
else
{
memcpyArray(dst[0 .. $], src[0 .. $]);
}
}
}
| void memcpy(T, size_t len)(ref T[len] dst, ref const T[len] src) | ||
| { | ||
| T[] d = dst[0 .. $]; | ||
| const T[] s = src[0 .. $]; |
There was a problem hiding this comment.
I agree that writing it in one line expresses the intent clearer.
|
I have the one problem I mentioned in the comment about trying to merge all of the Also, this: produces a segfault. Other than those:
If I move them from here, then this won't work. Unless you mean keep them here but also make a different PR (and then a different PR to remove them from Phobos). Also, I can't seem to make Edit: The closure was by accident. |
|
A note about compiler intrinsics: LLVM has intrinsics for memset and memcpy, but one doesn't need to use them to get optimizations going. E.g. in code like this, no intrinsics are used: yet the optimizer recognizes (For the interested, the same works for allocation functions: see here how Rust adds to its LLVM fork the names of its own stdlib functions to the list of recognized memory functions: rust-lang/llvm-project@1f4c17d . Although for that, we can use the |
|
Thanks for the info Johan! As I said in a previous comment, when I started the implementation for LDC, GDC, I didn't know that: As I said in the comments above, sadly (very sadly, I have written this code one for DMD and one for LDC, GDC) that info pretty much makes this project irrelevant.
So, this project is worth it if:
So, well I don't know.. I'm open to your thoughts. Btw
Wow!
Interersting |
|
Closed due to not being useful - more info here. |
This is a D replacement for the C Standard Library
memcpy. It provides 2 interfaces:ref, dynamic arrays or static arrays.Dmemcpy(for disambiguation with the libc), which takes the classic(void*, void*, size_t).Since this function uses SIMD, I found convenient to PR a simple SIMD library, which I hope will be extended by the community. Its main purpose is to provide cross-compilation intrinsics for D.
Ideally, the
core.experimental.memory.simdwould be a different PR, but without it basically a PR formemcpycan't happen.The change in
core.internal.traitsis the same done here: #2662I have copied everything necessary from
std.traitsto haveisArrayavailable.There are a couple things I'm not sure about but are minor, so better wait for feedback.